Privacy-preserving Clustering of Data Streams
نویسندگان
چکیده
As most previous studies on privacy-preserving data mining placed specific importance on the security of massive amounts of data from a static database, consequently data undergoing privacy-preservation often leads to a decline in the accuracy of mining results. Furthermore, following by the rapid advancement of Internet and telecommunication technology, subsequently data types have transformed from traditional static data into data streams with consecutive, rapid, temporal, and unpredictable properties. Due to the increase of such data types, traditional privacy-preserving data mining algorithms requiring complex calculation are no longer applicable. As a result, this paper has proposed a method of Privacy-Preserving Clustering of Data Streams (PPCDS) to improve data stream mining procedures while concurrently preserving privacy with a high degree of mining accuracy. PPCDS is mainly composed of two phases: Rotation-Based Perturbation and cluster mining. In the phase of data rotating perturbation phase, a rotation transformation matrix is applied to rapidly perturb the data streams in order to preserve data privacy. In the cluster mining phase, perturbed data will first establish a micro-cluster through optimization of cluster centers, then applying statistical calculation to update a micro-cluster, as well as using geometric time frame to allocate and store a micro-cluster, and finally output mining result through a macro-cluster generation. Two simple data structure are added in the macro-cluster generation process to avoid recalculating the distance between the macro-point and the cluster center in the generation process. This process reduces the repeated calculation time in order to enhance mining efficiency without losing mining
منابع مشابه
Principal Component Analysis Based Transformation for Privacy Preserving in Data Stream Mining
Data stream can be conceived as a continuous and changing sequence of data that continuously arrive at a system to store or process. Examples of data streams include computer network traffic, phone conversations, web searches and sensor data etc. The data owners or publishers may not be willing to exactly reveal the true values of their data due to various reasons, most notably privacy consider...
متن کاملRepeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well researched constrained clustering algorithms is called microaggregation. In a microaggreg...
متن کاملPrivacy-preserving Classification of Data Streams
Data mining is the information technology that extracts valuable knowledge from large amounts of data. Due to the emergence of data streams as a new type of data, data streams mining has recently become a very important and popular research issue. There have been many studies proposing efficient mining algorithms for data streams. On the other hand, data mining can cause a great threat to data ...
متن کاملGeometric Data Perturbation Techniques in Privacy Preserving On Data Stream Mining
Data mining is the information technology that extracts valuable knowledge from large amounts of data. Due to the emergence of data streams as a new type of data, data stream mining has recently become a very important and popular research issue. Privacy preservation issue of data streams mining is very important issue, in this dissertation work, an approach based on Geometric data perturbation...
متن کاملPrivacy Preserving Data Stream Classification Using Data Perturbation Techniques
Data stream can be conceived as a continuous and changing sequence of data that continuously arrive at a system to store or process. Examples of data streams include computer network traffic, phone conversations, web searches and sensor data etc. These data sets need to be analyzed for identifying trends and patterns, which help us in isolating anomalies and predicting future behavior. However,...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2009